Skip to content

fix(openai_compat): strip leaked Thinking/Final tags from content#1237

Closed
horsley wants to merge 1 commit intosipeed:mainfrom
horsley:fix/openai-compat-thinking-final-strip
Closed

fix(openai_compat): strip leaked Thinking/Final tags from content#1237
horsley wants to merge 1 commit intosipeed:mainfrom
horsley:fix/openai-compat-thinking-final-strip

Conversation

@horsley
Copy link
Copy Markdown
Contributor

@horsley horsley commented Mar 8, 2026

📝 Description

This PR fixes a leak path in openai_compat response parsing where provider-emitted reasoning wrappers (e.g. <think>...</think>) can surface in user-facing content.

Changes:

  • add defensive sanitizer in parseResponse to strip thinking/reasoning blocks
  • strip <final> wrappers while preserving final answer text
  • handle escaped bracket forms (\\u003c/\\u003e)
  • drop trailing unclosed thinking block to avoid reasoning leakage
  • add focused unit tests for normal, escaped, and unclosed-tag cases

🗣️ Type of Change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 📖 Documentation update
  • ⚡ Code refactoring (no functional changes, no api changes)

🤖 AI Code Generation

  • 🤖 Fully AI-generated (100% AI, 0% Human)
  • 🛠️ Mostly AI-generated (AI draft, Human verified/modified)
  • 👨‍💻 Mostly Human-written (Human lead, AI assisted or none)

🔗 Related Issue

Fixes #1235

📚 Technical Context (Skip for Docs)

🧪 Test Environment

  • Hardware: x86_64 PC
  • OS: Ubuntu Linux (kernel 6.17)
  • Model/Provider: OpenAI-compatible provider parser path (unit tests)
  • Channels: N/A (provider layer)

📸 Evidence (Optional)

Click to view Logs/Screenshots
make check
# => PASS
ok   github.com/sipeed/picoclaw/pkg/providers/openai_compat 0.044s

☑️ Checklist

  • My code/docs follow the style of this project.
  • I have performed a self-review of my own changes.
  • I have updated the documentation accordingly.

@sipeed-bot sipeed-bot bot added type: bug Something isn't working domain: provider go Pull requests that update go code labels Mar 8, 2026
Copy link
Copy Markdown

@nikolasdehor nikolasdehor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well-implemented defensive stripping of leaked thinking/reasoning tags from LLM provider responses.

Review notes:

  1. Three-phase approach -- quick regex check -> strip <final> wrappers -> remove thinking blocks. The quickReasoningTagRE early-exit is a good performance optimization for the common case where no tags are present.

  2. Escaped unicode handling -- converting \u003c/\u003e (and uppercase variants) before regex matching handles providers that double-escape angle brackets in JSON payloads. This is a known issue with some API implementations.

  3. Unclosed thinking block handling -- when inThinking is still true at the end of parsing, the trailing content is dropped. This is the correct behavior: if a thinking block was opened but never closed, everything after it is likely reasoning content.

  4. Regex patterns -- (?i) for case insensitivity is appropriate. The patterns handle variants like <think>, <thinking>, <thought>, <reasoning>, and <final> with optional whitespace and attributes.

  5. Test coverage -- three focused tests covering:

    • Normal thinking+final tags
    • Escaped unicode bracket forms
    • Unclosed thinking blocks
  6. Placement in the pipeline -- applying this in parseResponse (before returning to the caller) is the right layer. This ensures all consumers of the OpenAI-compatible provider get sanitized content regardless of which model/vendor is behind the API.

LGTM.

return &LLMResponse{
Content: choice.Message.Content,
Content: stripThinkingAndFinalTags(choice.Message.Content),
ReasoningContent: choice.Message.ReasoningContent,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be better to judge whether it is a think mode by whether these contents exist or not. If the content is judged by regular expression, it may misjudge. It is not a think model, but the original text has these keywords in the communication process.

body := strings.NewReader(`{
"choices": [{
"message": {
"content": "<think>internal reasoning</think><final>The answer is 2</final>"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-keyword-starting sentences should be added for testing.


// stripThinkingAndFinalTags removes leaked reasoning blocks while preserving
// user-facing answer text (e.g. <final>answer</final> -> answer).
func stripThinkingAndFinalTags(content string) string {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a dependent function., it can be placed in the tool class directory as a string tool.

}
}

func TestParseResponse_StripsThinkingAndFinalTags(t *testing.T) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can put all three tests in one test function, there is no need to split them into so many. A functional test function should be single-point.

@sipeed-bot
Copy link
Copy Markdown

sipeed-bot bot commented Mar 26, 2026

@horsley Hi! This PR has had no activity for over 2 weeks, so I'm closing it for now to keep things tidy. If it's still relevant, feel free to reopen it anytime and we'll pick it back up.

@sipeed-bot sipeed-bot bot closed this Mar 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain: provider go Pull requests that update go code type: bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug(openai_compat): strip leaked Thinking/Final tags from content

3 participants